AITopics | linear system

linear system

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

How AI settled the complexity of the oldest SGD algorithm

Dereziński, Michał, Dong, Xiaoyu

arXiv.org Machine Learning, Jun-30-2026

An essential catalyst for the remarkable breakthroughs in AI that led to the modern large language models (LLMs) such as ChatGPT and Gemini has been the algorithms used to train these models on massive datasets. While the LLM architectures have gotten progressively more complex, the training algorithms have stayed relatively simple, and in fact, they have all been based on the decades-old paradigm of stochastic gradient descent (SGD). The key idea behind SGD is that in order to minimize a certain objective function (such as an LLM's error on the training data), it suffices to access only a noisy estimate of that objective at any given time (e.g., based on a small sample of the data) while making incremental progress towards the solution. This is essential for LLM training, as the datasets have become so massive that one could not hope to perform computations on everything all at once. Commonly attributed to a 1951 paper by Robbins and Monro [34], SGD has seen a resurgence of interest over the last 20 years among AI researchers and computer scientists striving to understand its effectiveness, leading to numerous variants and extensions used in modern LLMs [12, 9], most notably the Adam algorithm [25]. As a result, we have gained a robust mathematical understanding of the computational complexity of SGD algorithms in a wide range of settings (e.g., see [11, 15, 5, 17]). Yet, despite this progress, there is a surprising gap in the understanding of SGD: the complexity of an algorithm proposed by Stefan Kaczmarz in 1937 [24] for solving a system of linear equations (the oldest published example of an SGD algorithm, which predates Robbins and Monro's paper by over a decade) has not been settled.
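To make the link between the Kaczmarz method and SGD concrete, here is a minimal sketch of a randomized Kaczmarz iteration. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: rows are sampled with probability proportional to their squared norm (the 1937 method cycles through rows instead), and the matrix `A`, the vector `x_true`, and the function name `kaczmarz` are made up for the example. Each step touches a single equation, which corresponds to an SGD step on the least-squares objective with step size 1/||a_i||^2.

```python
import random

def kaczmarz(A, b, iters=2000, seed=0):
    """Randomized Kaczmarz: project the iterate onto one sampled equation per step."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    # Squared row norms; used both as sampling weights (an assumed rule)
    # and as the normalization of each projection step.
    sq_norms = [sum(a * a for a in row) for row in A]
    for _ in range(iters):
        i = rng.choices(range(len(A)), weights=sq_norms)[0]
        row = A[i]
        # Residual of the single sampled equation a_i . x = b_i.
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / sq_norms[i]
        # Orthogonal projection of x onto the hyperplane a_i . x = b_i.
        x = [xj + step * a for xj, a in zip(x, row)]
    return x

# Small consistent system built from a known solution (illustrative data).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
x = kaczmarz(A, b)
```

Because the system is consistent, the iterates approach `x_true` even though no step ever looks at more than one row of `A`.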

large language model, machine learning, natural language, (22 more...)

arXiv.org Machine Learning

2606.29593

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project

Neural Information Processing Systems, Jun-22-2026, 20:16:02 GMT

Gaussian processes (GPs) play an essential role in biostatistics, scientific machine learning, and Bayesian optimization for their ability to provide probabilistic predictions and model uncertainty. However, GP inference struggles to scale to large datasets (which are common in modern applications), since it requires the solution of a linear system whose size scales quadratically with the number of samples in the dataset. We propose an approximate, distributed, accelerated sketch-and-project algorithm (ADASAP) for solving these linear systems, which improves scalability. We use the theory of determinantal point processes to show that the posterior mean induced by sketch-and-project rapidly converges to the true posterior mean. In particular, this yields the first efficient, condition number-free algorithm for estimating the posterior mean along the top spectral basis functions, showing that our approach is principled for GP inference. ADASAP outperforms state-of-the-art solvers based on conjugate gradient and coordinate descent across several benchmark datasets and a large-scale Bayesian optimization task. Moreover, ADASAP scales to a dataset with > 3 · 10^8 samples, a feat which has not been accomplished in the literature.
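As a rough illustration of the sketch-and-project idea mentioned above, the following sketch solves a small regularized kernel system using random two-coordinate sketches. It is not ADASAP: there is no approximation, distribution, or acceleration here, and the kernel, the ridge value 0.1, the block size, and the name `sketch_and_project` are all assumptions made for the example. For a symmetric positive definite matrix, projecting onto a random coordinate block in the matrix-induced norm reduces to solving the small subsystem on that block.

```python
import math
import random

def sketch_and_project(A, b, iters=3000, seed=0):
    """Sketch-and-project with random 2-coordinate sketches for SPD A."""
    rng = random.Random(seed)
    n = len(A)
    x = [0.0] * n
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        # Residual restricted to the two sampled coordinates.
        ri = b[i] - sum(A[i][k] * x[k] for k in range(n))
        rj = b[j] - sum(A[j][k] * x[k] for k in range(n))
        # Solve the 2x2 subsystem A[B, B] d = r[B] with Cramer's rule.
        det = A[i][i] * A[j][j] - A[i][j] * A[j][i]
        di = (ri * A[j][j] - A[i][j] * rj) / det
        dj = (A[i][i] * rj - A[j][i] * ri) / det
        x[i] += di
        x[j] += dj
    return x

# Illustrative GP-style system: RBF kernel matrix plus a small ridge term.
points = [0.0, 1.0, 2.0, 3.0]
A = [[math.exp(-0.5 * (p - q) ** 2) + (0.1 if i == j else 0.0)
      for j, q in enumerate(points)] for i, p in enumerate(points)]
b = [1.0, -1.0, 0.5, 2.0]
x = sketch_and_project(A, b)
```

Each iteration only needs two rows of the matrix, which is the property that makes this family of methods attractive when the full kernel matrix is too large to factorize.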

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Health & Medicine (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

SymMaP: Improving Computational Efficiency in Linear Solvers through Symbolic Preconditioning

Neural Information Processing Systems, Jun-17-2026, 13:52:50 GMT

Matrix preconditioning is a critical technique to accelerate the solution of linear systems, where performance heavily depends on the selection of preconditioning parameters. Traditional parameter selection approaches often define fixed constants for specific scenarios. However, they rely on domain expertise and fail to consider the instance-wise features for individual problems, limiting their performance. In contrast, machine learning (ML) approaches, though promising, are hindered by high inference costs and limited interpretability. To combine the strengths of both approaches, we propose a symbolic discovery framework, namely Symbolic Matrix Preconditioning (SymMaP), to learn efficient symbolic expressions for preconditioning parameters. Specifically, we employ a neural network to search the high-dimensional discrete space for expressions that can accurately predict the optimal parameters. The learned expression allows for high inference efficiency and excellent interpretability (expressed in concise symbolic formulas), making it simple and reliable for deployment. Experimental results show that SymMaP consistently outperforms traditional strategies across various benchmarks.

artificial intelligence, expression, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

A Unifying View of Linear Function Approximation in Off-Policy Reinforcement Learning through Matrix Splitting and Preconditioning

Neural Information Processing Systems, Jun-15-2026, 17:51:51 GMT

In off-policy policy evaluation (OPE) tasks within reinforcement learning, Temporal Difference Learning (TD) and Fitted Q-Iteration (FQI) have traditionally been viewed as differing in the number of updates toward the target value function: TD makes one update, FQI makes an infinite number, and Partial Fitted Q-Iteration (PFQI) performs a finite number. We show that this view is not accurate, and provide a new mathematical perspective under linear value function approximation that unifies these methods as a single iterative method solving the same linear system, but using different matrix splitting schemes and preconditioners. We show that increasing the number of updates under the same target value function, i.e., the target network technique, is a transition from using a constant preconditioner to using a data-feature adaptive preconditioner. This elucidates, for the first time, why TD convergence does not necessarily imply FQI convergence, and establishes tight convergence connections among TD, PFQI, and FQI. Our framework enables sharper theoretical results than previous work and characterization of the convergence conditions for each algorithm, without relying on assumptions about the features (e.g., linear independence). We also provide an encoder-decoder perspective to better understand the convergence conditions of TD, and prove, for the first time, that when a large learning rate doesn't work, trying a smaller one may help. Our framework also leads to the discovery of new crucial conditions on features for convergence, and shows how common assumptions about features influence convergence, e.g., the assumption of linearly independent features can be dropped without compromising the convergence guarantees of stochastic TD in the on-policy setting. This paper is also the first to introduce matrix splitting into the convergence analysis of these algorithms.
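The "same linear system, different preconditioner" view can be sketched with a generic preconditioned iteration theta <- theta + M (b - A theta). The notation below is an assumption made for illustration and is not taken from the abstract: `sigma_cov` stands for a feature covariance matrix, `sigma_cr` for a feature cross-covariance matrix, `A = sigma_cov - gamma * sigma_cr`, and all numbers are invented. Under that reading, an expected TD update uses the constant preconditioner `alpha * I`, while an FQI update uses the data-dependent preconditioner `sigma_cov^{-1}`.

```python
def matvec(M, v):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def preconditioned_iteration(A, b, M, iters=500):
    """Iterate theta <- theta + M (b - A theta) from a zero start."""
    theta = [0.0] * len(b)
    for _ in range(iters):
        residual = [bi - ai for bi, ai in zip(b, matvec(A, theta))]
        theta = [t + c for t, c in zip(theta, matvec(M, residual))]
    return theta

# Invented two-feature problem (all values are illustrative assumptions).
gamma = 0.9
sigma_cov = [[2.0, 0.0], [0.0, 1.0]]
sigma_cr = [[1.0, 0.5], [0.2, 0.5]]
b = [1.0, 1.0]
A = [[sigma_cov[i][j] - gamma * sigma_cr[i][j] for j in range(2)]
     for i in range(2)]

alpha = 0.5
M_td = [[alpha, 0.0], [0.0, alpha]]   # constant preconditioner (TD-style)
M_fqi = [[0.5, 0.0], [0.0, 1.0]]      # inverse of the diagonal sigma_cov (FQI-style)

theta_td = preconditioned_iteration(A, b, M_td)
theta_fqi = preconditioned_iteration(A, b, M_fqi)
```

In this toy instance both iterations converge to the same solution of `A theta = b`; with other matrices one iteration can converge while the other diverges, since convergence depends on the spectrum of `I - M A` for the chosen `M`.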

linear system, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)

Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs

Neural Information Processing Systems, Jun-15-2026, 16:31:12 GMT

The conjugate gradient solver (CG) is a prevalent method for solving symmetric and positive definite linear systems Ax = b, where effective preconditioners are crucial for fast convergence. Traditional preconditioners rely on prescribed algorithms to offer rigorous theoretical guarantees, while limiting their ability to exploit optimization from data. Existing learning-based methods often utilize Graph Neural Networks (GNNs) to improve the performance and speed up the construction. However, their reliance on incomplete factorization leads to significant challenges: the associated triangular solve hinders GPU parallelization in practice, and introduces long-range dependencies which are difficult for GNNs to model. To address these issues, we propose a learning-based method to generate GPU-friendly preconditioners, particularly using GNNs to construct Sparse Approximate Inverse (SPAI) preconditioners, which avoids triangular solves and requires only two matrix-vector products at each CG step.
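The "two matrix-vector products at each CG step" claim can be seen in a minimal preconditioned CG loop where the preconditioner is an explicit approximate inverse `M` that is applied by multiplication. This is a sketch under assumptions: a hand-built diagonal matrix stands in for the learned SPAI preconditioner, and the test system and function names are invented for the example.

```python
def matvec(M, v):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, M, tol=1e-10, max_iter=100):
    """Preconditioned CG where M approximates the inverse of SPD A."""
    x = [0.0] * len(b)
    r = list(b)                 # residual for the zero initial guess
    z = matvec(M, r)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, k
        Ap = matvec(A, p)       # product 1: system matrix times direction
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = matvec(M, r)        # product 2: approximate inverse times residual
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Invented SPD test system; the diagonal M is a stand-in for a learned SPAI.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
M = [[1.0 / A[i][i] if i == j else 0.0 for j in range(3)] for i in range(3)]
x, steps = pcg(A, b, M)
```

Applying `M` is a plain product with no sequential dependency between entries, in contrast to the forward and backward triangular solves that an incomplete-factorization preconditioner would need at the same point in the loop.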

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project

Neural Information Processing Systems, Jun-14-2026, 04:47:35 GMT

Gaussian processes (GPs) play an essential role in biostatistics, scientific machine learning, and Bayesian optimization for their ability to provide probabilistic predictions and model uncertainty. However, GP inference struggles to scale to large datasets (which are common in modern applications), since it requires the solution of a linear system whose size scales quadratically with the number of samples in the dataset. We propose an approximate, distributed, accelerated sketch-and-project algorithm (ADASAP) for solving these linear systems, which improves scalability. We use the theory of determinantal point processes to show that the posterior mean induced by sketch-and-project rapidly converges to the true posterior mean. In particular, this yields the first efficient, condition number-free algorithm for estimating the posterior mean along the top spectral basis functions, showing that our approach is principled for GP inference.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Finite Time Adaptive Stabilization of LQ Systems

Faradonbeh, Mohamad Kazem Shirani, Tewari, Ambuj, Michailidis, George

arXiv.org Machine Learning, Jun-4-2026

Stabilization of linear systems with unknown dynamics is a canonical problem in adaptive control. Since the lack of knowledge of system parameters can cause it to become destabilized, an adaptive stabilization procedure is needed prior to regulation. Therefore, the adaptive stabilization needs to be completed in finite time. In order to achieve this goal, asymptotic approaches are not very helpful. There are only a few existing non-asymptotic results and a full treatment of the problem is not currently available. In this work, leveraging the novel method of random linear feedbacks, we establish high probability guarantees for finite time stabilization. Our results hold for remarkably general settings because we carefully choose a minimal set of assumptions. These include stabilizability of the underlying system and restricting the degree of heaviness of the noise distribution. To derive our results, we also introduce a number of new concepts and technical tools to address regularity and instability of the closed-loop matrix.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1807.0912

Genre:

Research Report > New Finding (0.54)
Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Affine Tracing: A New Paradigm for Probabilistic Linear Solvers

Hegde, Disha, Pförtner, Marvin, Cockayne, Jon

arXiv.org Machine Learning, May-12-2026

Probabilistic linear solvers (PLSs) return probability distributions that quantify uncertainty due to limited computation in the solution of linear systems. The literature has traditionally distinguished between Bayesian PLSs, which condition a prior on information obtained from projections of the linear system, and probabilistic iterative methods (PIMs), which lift classical iterative solvers to probability space. In this work we show this dichotomy to be false: Bayesian PLSs are a special case of non-stationary affine PIMs. In addition, we prove that any realistic affine PIM is calibrated. These results motivate a focus on (non-stationary) affine PIMs, but their practical adoption has been limited by the significant manual effort required to implement them. To address this, we introduce affine tracing, an algorithmic framework that automatically constructs a PIM from a standard implementation of an affine iterative method by passing symbolic tracers through the computation to build an affine computational graph. We show how this graph can be transformed to compute posterior covariances, and how equality saturation can be used to perform algebraic simplifications required for computation under specific prior choices. We demonstrate the framework by automatically generating a probabilistic multigrid solver and evaluate its performance in the context of Gaussian process approximation.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

2605.10566

Country:

North America > United States (0.93)
Europe (0.67)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

fd8872fcba4ba87312cdfe5ebba91ca9-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 10:09:28 GMT

The appendix includes the missing proofs, detailed discussions of some arguments in the main body, and more numerical experiments. We organize the appendix as follows:
The proof of the infeasibility condition (Theorem 3.2) is provided in Section B.
Explanations of the conditions derived in Theorem 3.2 are included in Section C.
The proof of the properties of the proposed model (r)LogSpecT (Proposition 3.4 & 3.6) is given in Section D, and some additional properties are discussed.
The truncated Hausdorff distance based proof details of Theorem 4.1 and Corollary 4.4 are given in Section E.
Details of L-ADMM and its convergence analysis are in Section F.
Additional experiments and discussions on synthetic data are included in Section G.
Since the linear system (4) has no solution, we know from Farkas' lemma that the following system [...] Hence, S is also a solution to (13). However, (13) does not have a solution. We can conclude that rSpecT is infeasible in this case.

artificial intelligence, low-pass parameter, rlogspect, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.91)
